fill_value and dtype #2433
Conversation
Force-pushed from a1e5743 to 338f8d9 (Compare)
@dtype.setter
def dtype(self, dtype):
    self._dtype = dtype
    if dtype != self.dtype:
Should we not always allow cube.dtype = None, to reset the dtype to that of the underlying data?
I can see the case for wanting to unset the dtype of lazy data back to None, which you can't do at the moment given the current implementation ... I've no argument against that.
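To make the suggestion concrete, here is a minimal sketch of a dtype property that treats None as "fall back to the underlying data's dtype". The class and attribute names (_data, _dtype) are illustrative assumptions, not the actual iris.cube.Cube internals.

import numpy as np


class _DtypeSketch(object):
    """Minimal sketch only -- not the real iris.cube.Cube implementation."""

    def __init__(self, data, dtype=None):
        self._data = data      # real or lazy payload (assumed attribute name)
        self._dtype = None     # optional override of the payload dtype
        if dtype is not None:
            self.dtype = dtype

    @property
    def dtype(self):
        # With no override set, report the underlying data's dtype.
        return self._data.dtype if self._dtype is None else self._dtype

    @dtype.setter
    def dtype(self, dtype):
        if dtype is None:
            # Allow `cube.dtype = None` to reset to the underlying data's dtype.
            self._dtype = None
        else:
            self._dtype = np.dtype(dtype)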
Hi @bjlittle, highly impressed you have this passing now! In the meantime, here's an idea to ponder on... I think we will need to retain real-data reassignment ("cube.data ="), but we can probably do without lazy-data reassignment (I need to check the existing code). Even if not, we can define a particular function for this. Think on, will discuss later.
Ran 4006 tests in 104.584s
FAILED (SKIP=310, failures=38)

Test case failures by category - tick the box to own the failures, investigate and fix, push changes as a PR on this branch, and strike out the category when done ...
Keep cube's fill_value when resetting the cube's data in interpolation test
Sorry @bjlittle! I accidentally pushed the change (still getting the hang of PyCharm). You can always revert it if you disagree with it.
Keep fill values during stats operations
Fix concatenate tests.
Fix basic maths tests.
lib/iris/cube.py (Outdated)
# as the fill_value is checked against self.dtype.
self._dtype = None
if dtype is not None:
    self.dtype = dtype
I don't think we need to have this check here; we just need to initialise _dtype = None and then set self.dtype = dtype.
Agreed
dtype = np.dtype(dtype)
if dtype.kind != 'i':
    emsg = ('Can only cast lazy data to integral dtype, '
            'got {!r}.')
Why?
We need to be strict here. We're only allowing the dtype to be of integral type in order to support the masked case. If we didn't do this, then users would abuse it to perform casting of their data.
If we do want to support casting of data for its own sake, then we should support that properly, say with cube.astype(...) or whatever. Hence we nail this dtype change down to being applied only to the lazy, integral (masked) data case. HTH.
Just a note: if you're visiting this bit, I've found that this test should also be allowing 'u' type data, i.e. unsigned ints, e.g. if dtype.kind not in ('i', 'u'): ?
If not, it can wait -- I think it could do with a more general code search anyway, as we may have missed this possibility elsewhere in the code too.
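As a concrete illustration of the amended guard, here is a small sketch; the standalone function wrapper and the ValueError are illustrative assumptions, since the real check lives inside the dtype setter shown above.

import numpy as np

def _check_integral_dtype(dtype):
    # Accept both signed ('i') and unsigned ('u') integer kinds before
    # allowing the lazy-data dtype override; reject everything else.
    dtype = np.dtype(dtype)
    if dtype.kind not in ('i', 'u'):
        emsg = 'Can only cast lazy data to an integral dtype, got {!r}.'
        raise ValueError(emsg.format(dtype))
    return dtype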
Hmmm, good point @pp-mo ... See this concatenate change, and here also ... is there a generic numpy kind to catch all integral kinds instead of i and u?
This numpy scalars link was handy ... also see Array-protocol type strings section of numpy docs ...
And this pattern seems to work:
>>> integral = [np.arange(10, dtype=np.int8),
... np.arange(10, dtype=np.int16),
... np.arange(10, dtype=np.int32),
... np.arange(10, dtype=np.int64),
... np.arange(10, dtype=np.uint8),
... np.arange(10, dtype=np.uint16),
... np.arange(10, dtype=np.uint32),
... np.arange(10, dtype=np.uint64)]
>>> for i in integral:
... print(isinstance(i[0], np.integer))
...
True
True
True
True
True
True
True
True
>>> f = np.arange(10, dtype=np.float)
>>> isinstance(f[0], np.integer)
False

Don't know if it's preferable over value.dtype.kind in ('i', 'u') though ...
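A dtype-level variant of the same idea is np.issubdtype, which queries numpy's scalar-type hierarchy directly rather than inspecting array elements. A quick sketch (plain numpy; whether it reads better than the kind check is the open question above):

import numpy as np

# np.issubdtype walks numpy's scalar-type hierarchy, so it covers both
# signed and unsigned integer dtypes in a single check.
for dt in (np.int8, np.int64, np.uint8, np.uint64):
    assert np.issubdtype(np.dtype(dt), np.integer)

# Floats are not integral, so the check correctly rejects them.
assert not np.issubdtype(np.dtype('float64'), np.integer)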
"We're only allowing the dtype to be of integral type to support the masked case"

It might be worth putting a comment in the code about this.
I only mentioned the kind=u/i thing for completeness.
Let's not get hung up on this, please.
It will be much simpler to fix it later, as I'm sure it needs doing in unrelated bits of the code.
I do prefer np.integer.
It will also need to be fixed in iris/_concatenate.py e.g. https://github.com/SciTools/iris/pull/2433/files#diff-5fa4649482766af96ae11aead017e334R340
@pp-mo, can you make an issue for this please?
if kwargs['dtype'] is None and self.data_type.kind == 'i':
    kwargs['dtype'] = self.data_type
defn = iris.cube.CubeMetadata(**kwargs)
return defn
I should have some unit tests for this really ...
Actually, I'm not going to do that here as this might all change given conversations with @pp-mo
Created the following issue #2439 for this
def fill_value(self, fill_value):
    if fill_value is not None:
        # Convert the given value to the dtype of the cube.
        fill_value = np.asarray([fill_value])[0]
How about
fill_value, = np.asarray([fill_value])
I'm tending to steer away from the fill_value, = np.asarray([fill_value]) pattern to unpack a scalar value, as it's quite subtle, i.e. it's pretty easy for the eye to miss.
However, we could opt for [fill_value] = np.asarray([fill_value]) instead ...?
I'll make this change ...
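For reference, a small sketch comparing the three equivalent spellings discussed above (plain numpy, nothing iris-specific assumed):

import numpy as np

fill_value = 1e20

# Three equivalent ways to pull the converted scalar back out of a
# one-element array; they differ only in how visible the unpacking is.
a = np.asarray([fill_value])[0]    # explicit indexing
b, = np.asarray([fill_value])      # tuple unpacking -- easy for the eye to miss
[c] = np.asarray([fill_value])     # list-style unpacking -- more conspicuous

assert a == b == c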
lib/iris/tests/test_concatenate.py (Outdated)
result = concatenate(cubes)
self.assertEqual(len(result), 2)

def test_masked_fill_value(self):
I do wonder whether "test_masked_diff_fill_value" would be better
pp-mo left a comment
Can we just make this obscure stuff a bit clearer?
I'm otherwise happy with the code.
I must just whizz through the test changes too, but I'm not expecting to find anything much to complain of.
return result

def promote_defn(self):
    defn = self.defn
I think this is worthy of explanation! Suggest:
- call it "lazy_version_defn"
- docstring like: "Produce the defn (i.e. cube.metadata) which a lazy version of the cube would have. A cube with lazy data representing masked ints must have its `metadata.dtype` set appropriately."

Even if we think this may all change again later ...
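A rough sketch of what the renamed method and its docstring could look like, based on the snippets above; the _asdict() call, the 'u' kind handling, and the surrounding class (with self.defn, self.data_type, and iris.cube imported) are assumptions about the implementation, not the actual code:

def lazy_version_defn(self):
    """
    Produce the defn (i.e. cube.metadata) which a lazy version of the
    cube would have.

    A cube with lazy data representing masked ints must have its
    `metadata.dtype` set appropriately.

    """
    kwargs = self.defn._asdict()   # CubeMetadata is a namedtuple (assumed)
    if kwargs['dtype'] is None and self.data_type.kind in ('i', 'u'):
        kwargs['dtype'] = self.data_type
    return iris.cube.CubeMetadata(**kwargs)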
# in :meth:`iris.cube.CubeList.concatenate_cube()`.
msg = 'Cube metadata differs for phenomenon: {}'
msgs.append(msg.format(self.defn.name()))
promoted = self.promote_defn()
Explain here too. Something like "Check whether lazy versions of the defns would match."
The tests are failing due to pep8, @bjlittle.

Once these tests are passing I'll merge this in.

Are you happy for me to squash and merge (once the tests have passed)?

Ping @lbdreyer 👍

Ping @pp-mo @dkillick

Sorry about the wait. @pp-mo wanted to do some final checks.
* Testing ideas for cube data/dtype/fill_value interaction (not passing -- need new API).
* Add checks for realisation; fix dtype tests; use 'lazy' keyword instead of 'real'.
* With test failures to investigate.
* Fixed tests for new code from SciTools#2433.
* Fix tests
* Pep8 fix.
* Fixed data_dtype_fillvalue test fails (other things now failing).
* Fix bug in clearing _dask_array when assigning real data.
* Keep fill value of cube when resetting the cube's data
* Fix concatenate tests.
* Keep fill values during stats operations; cast cml fill values to float64.
* Keep fill values during regrid test.
* Fix basic maths tests.
* Tidy cube.__init__ for dtype setting.
* Review actions.
* Rename concatenate unit tests.
* Review actions.
* pep8
This PR addresses the issue of dealing with the dtype and fill_value of the cube.

We need to deal with the fact that dask does not support masked arrays, so we need to take care to correctly handle the intended dtype and fill_value of the data payload in the cube.

The dtype and fill_value have now been promoted to cube metadata attributes, because we care about them and they help define what a cube is. In the case of a dask lazy, masked data payload, we need to preserve the case where the intended dtype is integral.

We also have a shift in how fill_value is handled: previously we handed that off to biggus and numpy.ma in the hope that it would do the right thing. Now we deal with this directly within the cube through cube.fill_value, which is used as the intended fill_value of any masked data payload.
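Finally, a rough usage sketch of the behaviour described above. It assumes cube.fill_value and cube.dtype behave as proposed in this PR; the specific values and the Cube constructor call are illustrative only:

import numpy as np
import iris.cube

# dask has no masked-array support, so for a masked integral payload the
# intended dtype and fill_value are carried as cube metadata instead.
data = np.ma.masked_array([1, 2, 3], mask=[0, 1, 0], dtype=np.int16)
cube = iris.cube.Cube(data)

cube.fill_value = 999          # intended fill_value for the masked payload
assert cube.dtype == np.dtype('int16')

# When the data is realised, the masked array is reconstructed using the
# cube's fill_value rather than whatever biggus/numpy.ma would default to.
print(cube.dtype, cube.fill_value)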